Engine: engines/gpt2/rank0.engine
Note: This section tracks only explicit I/O buffers allocated by this program, not TensorRT internal workspace/scratch allocations.
| Tensor | Size (MB) | Approx. 64KB Pages | Approx. Fragmentation (KB) |
|---|---|---|---|
| present.21.key | 2.000 | 32 | 0.000 |
| present.21.value | 2.000 | 32 | 0.000 |
| present.22.key | 2.000 | 32 | 0.000 |
| present.22.value | 2.000 | 32 | 0.000 |
| present.23.key | 2.000 | 32 | 0.000 |
| present.23.value | 2.000 | 32 | 0.000 |
| present.13.key | 2.000 | 32 | 0.000 |
| present.13.value | 2.000 | 32 | 0.000 |
| present.14.key | 2.000 | 32 | 0.000 |
| present.14.value | 2.000 | 32 | 0.000 |
| present.15.key | 2.000 | 32 | 0.000 |
| present.15.value | 2.000 | 32 | 0.000 |
| present.16.key | 2.000 | 32 | 0.000 |
| present.16.value | 2.000 | 32 | 0.000 |
| present.17.key | 2.000 | 32 | 0.000 |
| present.17.value | 2.000 | 32 | 0.000 |
| present.18.key | 2.000 | 32 | 0.000 |
| present.18.value | 2.000 | 32 | 0.000 |
| present.19.key | 2.000 | 32 | 0.000 |
| present.19.value | 2.000 | 32 | 0.000 |
| present.20.key | 2.000 | 32 | 0.000 |
| present.20.value | 2.000 | 32 | 0.000 |
| present.5.key | 2.000 | 32 | 0.000 |
| present.5.value | 2.000 | 32 | 0.000 |
| present.6.key | 2.000 | 32 | 0.000 |
| present.6.value | 2.000 | 32 | 0.000 |
| present.7.key | 2.000 | 32 | 0.000 |
| present.7.value | 2.000 | 32 | 0.000 |
| present.8.key | 2.000 | 32 | 0.000 |
| present.8.value | 2.000 | 32 | 0.000 |
| present.9.key | 2.000 | 32 | 0.000 |
| present.9.value | 2.000 | 32 | 0.000 |
| present.10.key | 2.000 | 32 | 0.000 |
| present.10.value | 2.000 | 32 | 0.000 |
| present.11.key | 2.000 | 32 | 0.000 |
| present.11.value | 2.000 | 32 | 0.000 |
| present.12.key | 2.000 | 32 | 0.000 |
| present.12.value | 2.000 | 32 | 0.000 |
| past_key_values.21.key | 1.996 | 32 | 4.000 |
| past_key_values.21.value | 1.996 | 32 | 4.000 |
| past_key_values.22.key | 1.996 | 32 | 4.000 |
| past_key_values.22.value | 1.996 | 32 | 4.000 |
| past_key_values.23.key | 1.996 | 32 | 4.000 |
| past_key_values.23.value | 1.996 | 32 | 4.000 |
| present.0.key | 2.000 | 32 | 0.000 |
| present.0.value | 2.000 | 32 | 0.000 |
| present.1.key | 2.000 | 32 | 0.000 |
| present.1.value | 2.000 | 32 | 0.000 |
| present.2.key | 2.000 | 32 | 0.000 |
| present.2.value | 2.000 | 32 | 0.000 |
| present.3.key | 2.000 | 32 | 0.000 |
| present.3.value | 2.000 | 32 | 0.000 |
| present.4.key | 2.000 | 32 | 0.000 |
| present.4.value | 2.000 | 32 | 0.000 |
| past_key_values.13.key | 1.996 | 32 | 4.000 |
| past_key_values.13.value | 1.996 | 32 | 4.000 |
| past_key_values.14.key | 1.996 | 32 | 4.000 |
| past_key_values.14.value | 1.996 | 32 | 4.000 |
| past_key_values.15.key | 1.996 | 32 | 4.000 |
| past_key_values.15.value | 1.996 | 32 | 4.000 |
| past_key_values.16.key | 1.996 | 32 | 4.000 |
| past_key_values.16.value | 1.996 | 32 | 4.000 |
| past_key_values.17.key | 1.996 | 32 | 4.000 |
| past_key_values.17.value | 1.996 | 32 | 4.000 |
| past_key_values.18.key | 1.996 | 32 | 4.000 |
| past_key_values.18.value | 1.996 | 32 | 4.000 |
| past_key_values.19.key | 1.996 | 32 | 4.000 |
| past_key_values.19.value | 1.996 | 32 | 4.000 |
| past_key_values.20.key | 1.996 | 32 | 4.000 |
| past_key_values.20.value | 1.996 | 32 | 4.000 |
| past_key_values.5.key | 1.996 | 32 | 4.000 |
| past_key_values.5.value | 1.996 | 32 | 4.000 |
| past_key_values.6.key | 1.996 | 32 | 4.000 |
| past_key_values.6.value | 1.996 | 32 | 4.000 |
| past_key_values.7.key | 1.996 | 32 | 4.000 |
| past_key_values.7.value | 1.996 | 32 | 4.000 |
| past_key_values.8.key | 1.996 | 32 | 4.000 |
| past_key_values.8.value | 1.996 | 32 | 4.000 |
| past_key_values.9.key | 1.996 | 32 | 4.000 |
| past_key_values.9.value | 1.996 | 32 | 4.000 |
| past_key_values.10.key | 1.996 | 32 | 4.000 |
| past_key_values.10.value | 1.996 | 32 | 4.000 |
| past_key_values.11.key | 1.996 | 32 | 4.000 |
| past_key_values.11.value | 1.996 | 32 | 4.000 |
| past_key_values.12.key | 1.996 | 32 | 4.000 |
| past_key_values.12.value | 1.996 | 32 | 4.000 |
| input_ids | 0.000 | 1 | 63.996 |
| logits | 0.192 | 4 | 59.684 |
| past_key_values.0.key | 1.996 | 32 | 4.000 |
| attention_mask | 0.002 | 1 | 62.000 |
| past_key_values.0.value | 1.996 | 32 | 4.000 |
| past_key_values.1.key | 1.996 | 32 | 4.000 |
| past_key_values.1.value | 1.996 | 32 | 4.000 |
| past_key_values.2.key | 1.996 | 32 | 4.000 |
| past_key_values.2.value | 1.996 | 32 | 4.000 |
| past_key_values.3.key | 1.996 | 32 | 4.000 |
| past_key_values.3.value | 1.996 | 32 | 4.000 |
| past_key_values.4.key | 1.996 | 32 | 4.000 |
| past_key_values.4.value | 1.996 | 32 | 4.000 |
Total I/O Bytes: 192.006 MB
Total I/O Pages (approx, 64KB): 3078 (~192.375 MB)
Internal Fragmentation (approx): 377.680 KB
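The page and fragmentation columns above are consistent with rounding each buffer up to a whole number of 64KB pages and reporting the unused tail of the last page. The following is a minimal sketch of that arithmetic, not the report generator's actual code. The function name `page_stats` is hypothetical, and the byte sizes are inferred from the rounded table values rather than read from the engine.

```python
PAGE_BYTES = 64 * 1024  # 64KB page granularity used by the table


def page_stats(size_bytes: int) -> tuple[int, float]:
    """Return (pages, fragmentation_kb) for one buffer.

    Assumes each buffer is rounded up to whole 64KB pages.
    """
    pages = (size_bytes + PAGE_BYTES - 1) // PAGE_BYTES  # integer ceiling
    frag_kb = (pages * PAGE_BYTES - size_bytes) / 1024   # unused tail of last page
    return pages, frag_kb


# (size_bytes, count) per tensor group. Sizes are assumptions inferred
# from the table, e.g. 2.000 MB -> 2 * 1024 * 1024 bytes.
buffers = {
    "present.*": (2 * 1024 * 1024, 48),
    "past_key_values.*": (2044 * 1024, 48),
    "logits": (201_028, 1),
    "attention_mask": (2048, 1),
    "input_ids": (4, 1),
}

total_pages = sum(page_stats(size)[0] * n for size, n in buffers.values())
total_frag_kb = sum(page_stats(size)[1] * n for size, n in buffers.values())

print(page_stats(2044 * 1024))  # one past_key_values buffer
print(total_pages, round(total_frag_kb, 3))
```

Under these assumed sizes the totals come out to 3078 pages and about 377.680 KB of internal fragmentation, matching the summary lines above.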
This is a visualization of the virtual address range spanned by your I/O allocations, bucketed into 64KB blocks. It is an approximation, not a true GPU residency map.
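One way to produce such a bucketed view is to take the base address and size of each I/O buffer and compute how much of every 64KB block in the spanned range is covered. The sketch below illustrates the idea only. The function `bucket_occupancy` and the example addresses are hypothetical, and it assumes the allocations do not overlap.

```python
BLOCK_BYTES = 64 * 1024  # bucket size used by the visualization


def bucket_occupancy(allocs: list[tuple[int, int]]) -> list[float]:
    """Fraction of each 64KB block covered by allocations.

    allocs holds (base_address, size_bytes) pairs, assumed non-overlapping.
    The first block starts at the lowest base address rounded down to 64KB.
    """
    if not allocs:
        return []
    start = min(base for base, _ in allocs) // BLOCK_BYTES * BLOCK_BYTES
    end = max(base + size for base, size in allocs)
    n_blocks = (end - start + BLOCK_BYTES - 1) // BLOCK_BYTES
    used = [0] * n_blocks
    for base, size in allocs:
        pos, stop = base, base + size
        while pos < stop:
            idx = (pos - start) // BLOCK_BYTES
            block_end = start + (idx + 1) * BLOCK_BYTES
            chunk = min(stop, block_end) - pos  # bytes of this buffer in block idx
            used[idx] += chunk
            pos += chunk
    return [u / BLOCK_BYTES for u in used]


# Hypothetical addresses: a 4-byte buffer followed by a 201,028-byte buffer.
occupancy = bucket_occupancy([(0x10000, 4), (0x20000, 201_028)])
print([round(x, 4) for x in occupancy])
```

Because this works on virtual addresses only, it says nothing about which physical pages are resident, which is why the view is an approximation.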
Total VRAM: 8192 MB
Each block = 1 second. Height/color = utilization.
Latency (ms): min 2.883, avg 3.216, max 33.485
Avg throughput: 310.94 tok/s
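The throughput figure is close to the reciprocal of the average latency: 1000 / 3.216 ms is roughly 310.95 tok/s, and the small gap from 310.94 would be consistent with the average having been rounded to three decimals. Below is a minimal sketch of how such a summary could be derived from per-token latencies. The function `summarize` and the sample values are hypothetical, and the report may compute throughput differently.

```python
from statistics import mean


def summarize(latencies_ms: list[float]) -> dict[str, float]:
    """Min/avg/max latency in ms plus throughput in tokens per second.

    Assumes one token per measurement and throughput = 1000 / avg latency.
    """
    avg = mean(latencies_ms)
    return {
        "min": min(latencies_ms),
        "avg": avg,
        "max": max(latencies_ms),
        "tok_per_s": 1000.0 / avg,
    }


# Hypothetical sample, not data from this run.
stats = summarize([2.0, 4.0, 6.0])
print(stats)

# Reciprocal of the reported average latency.
print(round(1000.0 / 3.216, 2))
```

A single outlier such as the 33.485 ms maximum barely moves the average over many tokens, so the min and max are worth reading alongside the throughput.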
Each block = 1 token. Height/color = latency.
vram_usage.csv: VRAM used/total sampled at 1 Hz
token_latency.csv: per-token enqueue latency + tok/s
report_gpt2_limited.html: this report